negative case
Pre-Training Meta-Rule Selection Policy for Visual Generative Abductive Learning
Jin, Yu, Liu, Jingming, Luo, Zhexu, Peng, Yifei, Qin, Ziang, Dai, Wang-Zhou, Ding, Yao-Xiang, Zhou, Kun
Visual generative abductive learning studies jointly training symbol-grounded neural visual generator and inducing logic rules from data, such that after learning, the visual generation process is guided by the induced logic rules. A major challenge for this task is to reduce the time cost of logic abduction during learning, an essential step when the logic symbol set is large and the logic rule to induce is complicated. To address this challenge, we propose a pre-training method for obtaining meta-rule selection policy for the recently proposed visual generative learning approach AbdGen [Peng et al., 2023], aiming at significantly reducing the candidate meta-rule set and pruning the search space. The selection model is built based on the embedding representation of both symbol grounding of cases and meta-rules, which can be effectively integrated with both neural model and logic reasoning system. The pre-training process is done on pure symbol data, not involving symbol grounding learning of raw visual inputs, making the entire learning process low-cost. An additional interesting observation is that the selection policy can rectify symbol grounding errors unseen during pre-training, which is resulted from the memorization ability of attention mechanism and the relative stability of symbolic patterns. Experimental results show that our method is able to effectively address the meta-rule selection problem for visual abduction, boosting the efficiency of visual generative abductive learning.
AI and the Law: Evaluating ChatGPT's Performance in Legal Classification
The use of ChatGPT to analyze and classify evidence in criminal proceedings has been a topic of ongoing discussion. However, to the best of our knowledge, this issue has not been studied in the context of the Polish language. This study addresses this research gap by evaluating the effectiveness of ChatGPT in classifying legal cases under the Polish Penal Code. The results show excellent binary classification accuracy, with all positive and negative cases correctly categorized. In addition, a qualitative evaluation confirms that the legal basis provided for each case, along with the relevant legal content, was appropriate. The results obtained suggest that ChatGPT can effectively analyze and classify evidence while applying the appropriate legal rules. In conclusion, ChatGPT has the potential to assist interested parties in the analysis of evidence and serve as a valuable legal resource for individuals with less experience or knowledge in this area.
- Europe > Poland > Pomerania Province > Gdańsk (0.05)
- North America > United States > Ohio (0.04)
- North America > United States > Minnesota (0.04)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Bias Correction in Machine Learning-based Classification of Rare Events
Gubbels, Luuk, Puts, Marco, Daas, Piet
Online platform businesses can be identified by using web-scraped texts. This is a classification problem that combines elements of natural language processing and rare event detection. Because online platforms are rare, accurately identifying them with Machine Learning algorithms is challenging. Here, we describe the development of a Machine Learning-based text classification approach that reduces the number of false positives as much as possible. It greatly reduces the bias in the estimates obtained by using calibrated probabilities and ensembles.
- North America > United States > New Jersey (0.05)
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- Europe > France > Île-de-France > Paris > Paris (0.05)
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Jin, Yiping, Wanner, Leo, Shvets, Alexander
Online hate detection suffers from biases incurred in data sampling, annotation, and model pre-training. Therefore, measuring the averaged performance over all examples in held-out test data is inadequate. Instead, we must identify specific model weaknesses and be informed when it is more likely to fail. A recent proposal in this direction is HateCheck, a suite for testing fine-grained model functionalities on synthesized data generated using templates of the kind "You are just a [slur] to me." However, despite enabling more detailed diagnostic insights, the HateCheck test cases are often generic and have simplistic sentence structures that do not match the real-world data. To address this limitation, we propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch by instructing large language models (LLMs). We employ an additional natural language inference (NLI) model to verify the generations. Crowd-sourced annotation demonstrates that the generated test cases are of high quality. Using the new functional tests, we can uncover model weaknesses that would be overlooked using the original HateCheck dataset.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- (13 more...)
Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings
Peng, Yifei, Jin, Yu, Luo, Zhexu, Ding, Yao-Xiang, Dai, Wang-Zhou, Ren, Zhong, Zhou, Kun
Despite the great success of neural visual generative models in recent years, integrating them with strong symbolic reasoning systems remains a challenging task. There are two levels of symbol grounding problems among the core challenges: the first is symbol assignment, i.e. mapping latent factors of neural visual generators to semantic-meaningful symbolic factors from the reasoning systems by learning from limited labeled data. The second is rule learning, i.e. learning new rules that govern the generative process to enhance the symbolic reasoning systems. To deal with these two problems, we propose a neurosymbolic learning approach, Abductive visual Generation (AbdGen), for integrating logic programming systems with neural visual generative models based on the abductive learning framework. To achieve reliable and efficient symbol grounding, the quantized abduction method is introduced for generating abduction proposals by the nearest-neighbor lookup within semantic codebooks. To achieve precise rule learning, the contrastive meta-abduction method is proposed to eliminate wrong rules with positive cases and avoid less informative rules with negative cases simultaneously. Experimental results show that compared to the baseline approaches, AbdGen requires significantly less labeled data for symbol assignment. Furthermore, AbdGen can effectively learn underlying logical generative rules from data, which is out of the capability of existing approaches.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19
Pigoli, Davide, Baker, Kieran, Budd, Jobie, Butler, Lorraine, Coppock, Harry, Egglestone, Sabrina, Gilmour, Steven G., Holmes, Chris, Hurley, David, Jersakova, Radka, Kiskin, Ivan, Koutra, Vasiliki, Mellor, Jonathon, Nicholson, George, Packham, Joe, Patel, Selina, Payne, Richard, Roberts, Stephen J., Schuller, Björn W., Tendero-Cañadas, Ana, Thornley, Tracey, Titcomb, Alexander
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performances of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant meta-data. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- Europe > United Kingdom > England > Greater London > London (0.06)
- (7 more...)
Subtyping patients with chronic disease using longitudinal BMI patterns
Mottalib, Md Mozaharul, Jones-Smith, Jessica C, Sheridan, Bethany, Beheshti, Rahmatollah
Obesity is a major health problem, increasing the risk of various major chronic diseases, such as diabetes, cancer, and stroke. While the role of obesity identified by cross-sectional BMI recordings has been heavily studied, the role of BMI trajectories is much less explored. In this study, we use a machine-learning approach to subtype individuals' risk of developing 18 major chronic diseases by using their BMI trajectories extracted from a large and geographically diverse EHR dataset capturing the health status of around two million individuals for a period of six years. We define nine new interpretable and evidence-based variables based on the BMI trajectories to cluster the patients into subgroups using the k-means clustering method. We thoroughly review each cluster's characteristics in terms of demographic, socioeconomic, and physiological measurement variables to specify the distinct properties of the patients in the clusters. In our experiments, the direct relationship of obesity with diabetes, hypertension, Alzheimer's, and dementia has been re-established and distinct clusters with specific characteristics for several of the chronic diseases have been found to be conforming or complementary to the existing body of knowledge.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Delaware > New Castle County > Newark (0.14)
- Europe > Middle East > Malta > Northern Region > Western District > Attard (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- (3 more...)
Towards a responsible machine learning approach to identify forced labor in fisheries
Joo, Rocío, McDonald, Gavin, Miller, Nathan, Kroodsma, David, Farthing, Courtney, Belhabib, Dyhia, Hochberg, Timothy
Many fishing vessels use forced labor, but identifying vessels that engage in this practice is challenging because few are regularly inspected. We developed a positive-unlabeled learning algorithm using vessel characteristics and movement patterns to estimate an upper bound of the number of positive cases of forced labor, with the goal of helping make accurate, responsible, and fair decisions. 89% of the reported cases of forced labor were correctly classified as positive (recall) while 98% of the vessels certified as having decent working conditions were correctly classified as negative. The recall was high for vessels from different regions using different gears, except for trawlers. We found that as much as ~28% of vessels may operate using forced labor, with the fraction much higher in squid jiggers and longlines. This model could inform risk-based port inspections as part of a broader monitoring, control, and surveillance regime to reduce forced labor. * Translated versions of the English title and abstract are available in five languages in S1 Text: Spanish, French, Simplified Chinese, Traditional Chinese, and Indonesian.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Taiwan (0.04)
- (19 more...)
- Law > Labor & Employment Law (1.00)
- Food & Agriculture > Fishing (1.00)
Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation
Le, Thanh-Dung, Noumeir, Rita, Rambaud, Jerome, Sans, Guillaume, Jouvet, Philippe
The rapid progress in clinical data management systems and artificial intelligence approaches enable the era of personalized medicine. Intensive care units (ICUs) are the ideal clinical research environment for such development because they collect many clinical data and are highly computerized environments. We designed a retrospective clinical study on a prospective ICU database using clinical natural language to help in the early diagnosis of heart failure in critically ill children. The methodology consisted of empirical experiments of a learning algorithm to learn the hidden interpretation and presentation of the French clinical note data. This study included 1386 patients' clinical notes with 5444 single lines of notes. There were 1941 positive cases (36 % of total) and 3503 negative cases classified by two independent physicians using a standardized approach. The multilayer perceptron neural network outperforms other discriminative and generative classifiers. Consequently, the proposed framework yields an overall classification performance with 89 % accuracy, 88 % recall, and 89 % precision. This study successfully applied learning representation and machine learning algorithms to detect heart failure from clinical natural language in a single French institution. Further work is needed to use the same methodology in other institutions and other languages.
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
FBI: Fingerprinting models with Benign Inputs
Maho, Thibault, Furon, Teddy, Merrer, Erwan Le
Recent advances in the fingerprinting of deep neural networks detect instances of models, placed in a black-box interaction scheme. Inputs used by the fingerprinting protocols are specifically crafted for each precise model to be checked for. While efficient in such a scenario, this nevertheless results in a lack of guarantee after a mere modification (like retraining, quantization) of a model. This paper tackles the challenges to propose i) fingerprinting schemes that are resilient to significant modifications of the models, by generalizing to the notion of model families and their variants, ii) an extension of the fingerprinting task encompassing scenarios where one wants to fingerprint not only a precise model (previously referred to as a detection task) but also to identify which model family is in the black-box (identification task). We achieve both goals by demonstrating that benign inputs, that are unmodified images, for instance, are sufficient material for both tasks. We leverage an information-theoretic scheme for the identification task. We devise a greedy discrimination algorithm for the detection task. Both approaches are experimentally validated over an unprecedented set of more than 1,000 networks.